Cross-Language Information Retrieval based on category matching between language versions of a web directory

Authors

  • Fuminori Kimura
  • Akira Maeda
  • Masatoshi Yoshikawa
  • Shunsuke Uemura
Abstract

Since the Web consists of documents in various domains or genres, the method for Cross-Language Information Retrieval (CLIR) of Web documents should be independent of a particular domain. In this paper, we propose a CLIR method which employs a Web directory provided in multiple language versions (such as Yahoo!). In the proposed method, feature terms are first extracted from Web documents for each category in the source and the target languages. Then, one or more corresponding categories in another language are determined beforehand by comparing similarities between categories across languages. Using these category pairs, we intend to resolve ambiguities of simple dictionary translation by narrowing the categories to be retrieved in the target language.
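The category-matching step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not name a similarity measure, so cosine similarity over weighted feature-term vectors is assumed here, and the category names, term weights, and bilingual dictionary are all hypothetical toy data.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def translate_vector(vec, dictionary):
    """Map source-language feature terms to every dictionary translation."""
    out = Counter()
    for term, weight in vec.items():
        for candidate in dictionary.get(term, []):
            out[candidate] += weight
    return out


def match_categories(src_cats, tgt_cats, dictionary, top_k=1):
    """For each source category, return the top_k most similar target categories."""
    pairs = {}
    for name, vec in src_cats.items():
        translated = translate_vector(vec, dictionary)
        ranked = sorted(
            tgt_cats,
            key=lambda c: cosine(translated, tgt_cats[c]),
            reverse=True,
        )
        pairs[name] = ranked[:top_k]
    return pairs


# Hypothetical toy data: romanized Japanese feature terms per source category.
src_cats = {
    "ja:Sports": {"yakyu": 3.0, "shiai": 2.0},
    "ja:Finance": {"ginko": 3.0, "kinri": 2.0},
}
tgt_cats = {
    "en:Sports": {"baseball": 3.0, "game": 2.0, "team": 1.0},
    "en:Finance": {"bank": 3.0, "interest": 2.0, "loan": 1.0},
}
# Hypothetical bilingual dictionary; "shiai" is deliberately ambiguous.
dictionary = {
    "yakyu": ["baseball"],
    "shiai": ["game", "match"],
    "ginko": ["bank"],
    "kinri": ["interest"],
}

pairs = match_categories(src_cats, tgt_cats, dictionary)
print(pairs)
```

Because the pairs are computed beforehand, a query whose terms fall in a given source category could then be searched only within the paired target categories, which is how the narrowing described in the abstract would limit the effect of ambiguous dictionary translations.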


Similar resources

CLIR Using Web Directory at NTCIR-4

In this paper, we propose a CLIR method which employs a Web directory provided in multiple language versions (such as Yahoo!). In the proposed method, feature terms are first extracted from Web documents for each category in the source and the target languages. Category matching is then conducted in advance to determine category pairs across languages. Using these category pairs, ...


Cross-Language Information Retrieval Based on Category Matching of Web Directories

With the popularity of the Internet, more and more languages are being used for Web documents. Accordingly, Cross-Language Information Retrieval (CLIR), a method to retrieve documents written in one or more languages using a query written in another language, has been actively studied. A variety of methods, including employing corpus statistics for translation of terms and disambiguati...


Analysis of Appropriate Category Level of Web Directory for Cross-Language Information Retrieval

In this paper, we analyze the appropriate category level of a Web directory for Cross-Language Information Retrieval (CLIR). Our proposed method for CLIR is based on estimating the domains of the query using the hierarchical structure of Web directories. Correct domain estimation therefore requires detecting the appropriate category level of the Web directory. We conducted experiments of retrieval using ...


Impact of Controlled and Free Language Use in Retrieving Articles from the ProQuest and Science Direct Databases

Introduction: The growth and expansion of the Internet has changed the way information is accessed, and many facilities have been created on the Web to make locating information easier and faster. Objective: To identify the impact of keyword documentation using the medical thesaurus on the retrieval of articles from the ProQuest and Science Direct databases. Materials and Methods: The pr...


Improved Skips for Faster Postings List Intersection

Information retrieval can be achieved through computerized processes by generating a list of relevant responses to a query. The document processor, matching function, and query analyzer are the main components of an information retrieval system. Document retrieval systems are fundamentally based on Boolean, vector-space, probabilistic, and language models. In this paper, a new methodology for mat...




Publication date: 2003